You know how this starts
- Some data, maybe a lot of data
- Maybe a database
- Extra analysis tools
- More computing horsepower
- Before you know it, you end up with…
Eventually, some of those arrows connect to R.
Moving data around within a data center is not as big a deal as you might think.
http://www.eecs.berkeley.edu/~ganesha/disk-irrelevant_hotos2011.pdf
Reusing data already held in cache or fast storage (memory locality) is usually more important.
Marshaling data in and out of miscellaneous data formats, however, is a big deal!
Yuck!
Feather: a common data frame serialization format for R and Python and …
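A shared binary data frame format can be illustrated with a toy sketch in Python. This is not Feather: Feather builds on the Apache Arrow columnar format and handles many column types, missing values and metadata. The helpers `write_frame` and `read_frame` and the `TOYF` magic bytes are hypothetical, invented for illustration. They only show that when writer and reader agree on one binary columnar layout, a data frame of float columns can move between programs without per-value text parsing.

```python
import json
import struct
import sys
from array import array
from typing import Dict, List

MAGIC = b"TOYF"  # hypothetical magic bytes for this toy format (not Feather's)


def write_frame(cols: Dict[str, List[float]]) -> bytes:
    """Serialize equal-length float columns: magic, header length, JSON header, raw buffers."""
    names = list(cols)
    nrow = len(cols[names[0]]) if names else 0
    if any(len(cols[n]) != nrow for n in names):
        raise ValueError("columns must have equal length")
    header = json.dumps({"names": names, "nrow": nrow}).encode("utf-8")
    parts = [MAGIC, struct.pack("<I", len(header)), header]
    for n in names:
        buf = array("d", cols[n])  # contiguous float64 column
        if sys.byteorder == "big":
            buf.byteswap()  # always store little-endian
        parts.append(buf.tobytes())
    return b"".join(parts)


def read_frame(blob: bytes) -> Dict[str, List[float]]:
    """Inverse of write_frame: slice each column straight out of the byte buffer."""
    if blob[:4] != MAGIC:
        raise ValueError("not a toy frame")
    (hlen,) = struct.unpack_from("<I", blob, 4)
    header = json.loads(blob[8:8 + hlen].decode("utf-8"))
    nrow = header["nrow"]
    offset = 8 + hlen
    out: Dict[str, List[float]] = {}
    for n in header["names"]:
        buf = array("d")
        buf.frombytes(blob[offset:offset + 8 * nrow])
        if sys.byteorder == "big":
            buf.byteswap()
        out[n] = buf.tolist()
        offset += 8 * nrow
    return out
```

The reader never parses individual values; it copies whole column buffers, which is the property that makes a common columnar format cheap to exchange.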
SciDB streaming API
Feathercache: a fast object store interface for R (experimental)
Simple GET/PUT/DELETE interface
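A GET/PUT/DELETE object store interface can be sketched in a few lines. This is a hypothetical in-memory stand-in written in Python, not feathercache's interface: the `ObjectStore` class and its `put`/`get`/`delete` methods are assumptions, and `pickle` stands in for whatever serialization a real store applies before sending objects over the network.

```python
import pickle
from typing import Any, Dict, List


class ObjectStore:
    """Hypothetical toy object store with a GET/PUT/DELETE interface (not feathercache's API)."""

    def __init__(self) -> None:
        self._blobs: Dict[str, bytes] = {}

    def put(self, key: str, value: Any) -> None:
        # Serialize on write, as a networked store must; later mutation
        # of `value` by the caller does not affect the stored copy.
        self._blobs[key] = pickle.dumps(value)

    def get(self, key: str) -> Any:
        # Raises KeyError for a missing key (the analogue of an HTTP 404).
        return pickle.loads(self._blobs[key])

    def delete(self, key: str) -> bool:
        # Returns True if a value was removed, False if the key was absent.
        return self._blobs.pop(key, None) is not None

    def keys(self) -> List[str]:
        return sorted(self._blobs)


if __name__ == "__main__":
    store = ObjectStore()
    store.put("iris/sepal", {"length": [5.1, 4.9], "width": [3.5, 3.0]})
    print(store.get("iris/sepal")["length"])
    store.delete("iris/sepal")
```

Keeping the interface to three verbs means any client that can serialize an object and issue a keyed request can share data through the store.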
Workloads with high compute-to-data ratios benefit from elastic compute
Elastic computing with R and Redis on Amazon EC2
doRedis update due on CRAN soon; for now, use the GitHub version
Vignettes
Amazon EC2 Recipe
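The elastic pattern behind doRedis can be sketched as a shared work queue: the master enqueues tasks, any number of workers pull tasks until the queue is drained, and results are matched back to tasks by index. The sketch below is an in-process stand-in only: `queue.Queue` plays the role of the queue held in Redis and threads play the role of R workers on EC2 instances. The function `run_elastic` is hypothetical and does not talk to Redis.

```python
import queue
import threading
from typing import Callable, Dict, List, Tuple


def run_elastic(tasks: List[int], fn: Callable[[int], int], n_workers: int) -> List[int]:
    """Hypothetical master/worker sketch; queue.Queue stands in for a Redis-held queue."""
    if n_workers < 1:
        raise ValueError("need at least one worker")

    work: "queue.Queue[Tuple[int, int]]" = queue.Queue()
    results: "queue.Queue[Tuple[int, int]]" = queue.Queue()

    # Master: enqueue every task, tagged with its position.
    for idx, x in enumerate(tasks):
        work.put((idx, x))

    def worker() -> None:
        # Pull tasks until none remain, then leave. A worker never needs
        # to know how many peers exist (error handling omitted).
        while True:
            try:
                idx, x = work.get_nowait()
            except queue.Empty:
                return
            results.put((idx, fn(x)))

    threads = [threading.Thread(target=worker) for _ in range(n_workers)]
    for t in threads:
        t.start()

    # Master: collect one result per task, then restore the original order.
    by_idx: Dict[int, int] = {}
    for _ in range(len(tasks)):
        idx, y = results.get()
        by_idx[idx] = y

    for t in threads:
        t.join()
    return [by_idx[i] for i in range(len(tasks))]


if __name__ == "__main__":
    print(run_elastic([1, 2, 3, 4], lambda x: x * x, n_workers=3))
```

Because workers simply pull until the queue is empty, the master does not care how many of them exist, which is what lets capacity grow or shrink. Python threads will not speed up CPU-bound work here because of the GIL; they only model the coordination.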